Scalable volumetric 3D reconstruction

ABSTRACT

Scalable volumetric reconstruction is described whereby data from a mobile environment capture device is used to form a 3D model of a real-world environment. In various examples, a hierarchical structure is used to store the 3D model where the structure comprises a root level node, a plurality of interior level nodes and a plurality of leaf nodes, each of the nodes having an associated voxel grid representing a portion of the real world environment, the voxel grids being of finer resolution at the leaf nodes than at the root node. In various examples, parallel processing is used to enable captured data to be integrated into the 3D model and/or to enable images to be rendered from the 3D model. In an example, metadata is computed and stored in the hierarchical structure and used to enable space skipping and/or pruning of the hierarchical structure.

BACKGROUND

Three dimensional reconstruction of surfaces in the environment is used for many tasks such as robotics, engineering prototyping, immersive gaming, augmented reality and others. For example, a moving capture device may capture images and data as it moves about in an environment; the captured information may be used to automatically compute a volumetric model of the environment such as a living room or an office. In other examples the capture device may be static whilst one or more objects move in relation to it. Existing systems for computing volumetric 3D reconstructions of environments and/or objects are typically limited in the size of the real world volume they are able to reconstruct, for example due to memory and processing capacity constraints and, for many applications, the desire to operate in real time.

The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known systems for computing volumetric 3D reconstructions of environments and/or objects.

SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements or delineate the scope of the specification. Its sole purpose is to present a selection of concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.

Scalable volumetric reconstruction is described whereby data from a mobile environment capture device is used to form a 3D model of a real-world environment. In various examples, a hierarchical structure is used to store the 3D model where the structure comprises a root level node, a plurality of interior level nodes and a plurality of leaf nodes, each of the nodes having an associated voxel grid representing a portion of the real world environment, the voxel grids being of finer resolution at the leaf nodes than at the root node. In various examples, parallel processing is used to enable captured data to be integrated into the 3D model and/or to enable images to be rendered from the 3D model. In an example, metadata is computed and stored in the hierarchical structure and used to enable space skipping and/or pruning of the hierarchical structure.

In some examples the 3D model of the real-world environment is stored, either as a regular grid or using a hierarchical structure, and data of the 3D model is streamed between at least one parallel processing unit and one or more host computing devices.

In some examples a plurality of parallel processing units are used, each having a memory storing at least part of the 3D model. For example, each parallel processing unit uses the same amount of memory mapped to different physical dimensions in the real-world environment.

Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:

FIG. 1 is a schematic diagram of a 3D environment modeling system for use with a mobile environment capture device;

FIG. 2 is a flow diagram of a method at the 3D environment modeling system of FIG. 1;

FIG. 3 is a schematic diagram of a hierarchical data structure for storing a 3D model generated using the 3D environment modeling system of FIG. 1;

FIG. 4 is a schematic diagram of part of the hierarchical data structure of FIG. 3;

FIG. 5 is a flow diagram of a method of forming a hierarchical data structure such as that of FIG. 3;

FIG. 6 is a schematic diagram of memory at a parallel processing unit used to form the hierarchical data structure of FIG. 4;

FIG. 7 is a flow diagram of a method of integrating a depth map into the hierarchical data structure of FIG. 3;

FIG. 8 is a flow diagram of a method of summarization and pruning of a hierarchical data structure such as that of FIG. 3;

FIG. 9 is a flow diagram of a method of rendering;

FIG. 10 is a flow diagram of a method of integrating a depth map into a dense 3D environment model;

FIG. 11 is a schematic diagram of an active region and a working set;

FIG. 12 is a flow diagram of a method of streaming;

FIG. 13 is a flow diagram of the streaming out part of the method of FIG. 12 in more detail;

FIG. 14 is a schematic diagram of layered volumes in world space and of a plurality of parallel computing devices used to represent the world space volumes;

FIG. 15 is a flow diagram of a method of integrating a depth map into layered volumes;

FIG. 16 is a flow diagram of a method of streaming implemented for layered volumes;

FIG. 17 is a flow diagram of another method of integrating a depth map into layered volumes;

FIG. 18 illustrates an exemplary computing-based device in which embodiments of a 3D environment reconstruction system may be implemented.

Like reference numerals are used to designate like parts in the accompanying drawings.

DETAILED DESCRIPTION

The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.

Although the present examples are described and illustrated herein as being implemented in a computing device having one or more graphics processing units, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of computing devices having parallel computing ability.

FIG. 1 is a schematic diagram of a 3D environment modeling system 110 for use with a mobile environment capture device 100. Using the captured images and data 108 the 3D environment modeling system 110 is able to construct a detailed model 116 of 3D surfaces in its environment. For example the model may store enough information so that it may be used to depict exterior surfaces of a sports car showing curves, indentations, relief work, wing mirrors, handles and detailed surfaces of the sports car engine (when the bonnet is open), its dashboard and interior. In another example, the surfaces may be floors, walls, bookshelves, staircases, light fittings, furniture and other objects inside a bookshop. In another example the surfaces may be of shop fronts, lamp posts, tree foliage and other objects on a street. The level of detail may be such that individual keys of a keyboard may be discriminated where a keyboard is in the environment being captured. Finer levels of detail may also be possible. The model captures how the surfaces are positioned in the real world, so that it is possible to use the model to navigate in the environment for example, or to project virtual reality objects into the environment in a manner which takes into account the real environment. The model may be imported into other systems, such as games or computer aided design systems, to enable the model to be used, for example to generate an entity in a computer game, such as a sports car, or to facilitate prototyping of sports car designs.

In the example illustrated in FIG. 1 a user operates the mobile environment capture device 100 which is handheld whilst moving in an environment such as any of: a space occupied by a sports car, a bookshop and a street. These are examples only; the mobile environment capture device 100 may be operated, by a human or an automated system, in any environment in which its capture devices will operate effectively. Images and optionally other captured data 108 are transferred from the mobile environment capture device 100 to a 3D environment modeling system 110, for example by wired or wireless connection. In other examples the capture device 100 and the 3D environment modeling system are integral. The 3D environment modeling system 110 is computer implemented using one or more parallel computing units and at least one host computing device. It comprises a 3D model generation system 112 for generating a 3D model 116 of the environment and/or objects. It comprises a real time tracker 114 for tracking a position and orientation (referred to as pose) of the mobile environment capture device 100. In some examples it comprises a streaming engine 118 for streaming at least part of the 3D model 116 between one or more parallel computing units and a host computing device. In some examples it comprises a layering system 120 for enabling the “viewing distance” to be increased; that is, to enable a greater depth range from the mobile environment capture device to be represented. This is useful where depth cameras with greater range are available.

As mentioned above, the 3D model 116 generated by the 3D environment modeling system 110 may be exported to a game system 124. That is, the 3D model 116 and other data such as the camera pose from the real time tracker 114, the captured images and data 108 and other data may be input to a downstream system 122 for ongoing processing. Examples of downstream systems 122 include but are not limited to: game system 124, augmented reality system 126, cultural heritage archive 128, robotic system 130. A cultural heritage archive may store 3D models of objects and/or environments for record preservation and study.

The mobile environment capture device 100 comprises a depth camera which is arranged to capture sequences of depth images of a scene. Each depth image (or depth map frame) comprises a two dimensional image in which each image element (such as a pixel or group of pixels) comprises a depth value such as a length or distance from the camera to an object in the captured scene which gave rise to that image element. This depth value may be an absolute value provided in specified units of measurement such as meters or centimeters, or may be a relative depth value. In each captured depth image there may be around 300,000 or more image elements each having a depth value. The frame rate of the depth camera is high enough to enable the depth images to be used for working robotics, computer game or other applications. For example, the frame rate may be in the range of 20 to 100 frames per second.

The depth information may be obtained using any suitable technique including, but not limited to, time of flight, structured light, and stereo images. The mobile environment capture device 100 may also comprise an emitter arranged to illuminate the scene in such a manner that depth information may be ascertained by the depth camera.

The mobile environment capture device 100 also comprises one or more processors, a memory and a communications infrastructure. It may be provided in a housing which is shaped and sized to be hand held by a user or worn by a user. In other examples the mobile environment capture device is sized and shaped to be incorporated or mounted on a vehicle, toy or other movable apparatus. The mobile environment capture device 100 may have a display device, for example to display images rendered from the 3D model in order to enable a user to tell which areas of an environment are yet to be visited to capture data for the 3D model.

FIG. 2 is a flow diagram of a method at the 3D environment modeling system of FIG. 1 for integrating depth maps, from a stream of depth maps captured by the mobile environment capture device, into a dense 3D model of the environment surfaces. In this way a dense 3D model of the environment surfaces is gradually built up as more depth maps are received from different camera viewpoints. The term “integration” is used here to refer to fusing or aggregating data from a current depth map into the dense 3D model.

The mobile environment capture device computes 204 the current pose of the mobile capture device using real time tracker 114. For example, the current pose may be computed using an iterative closest point process that takes as input the current depth map and a corresponding depth map rendered 214 from the current 3D model 208 of the environment. Examples of this type of method are described in detail in US patent publication 20120196679 entitled “Real-Time Camera Tracking Using Depth Maps” Newcombe et al. filed on 31 Jan. 2011 and published on 2 Aug. 2012. It is also possible for the current pose to be computed using a process where depth observations from a mobile depth camera are aligned with surfaces of a 3D model of the environment in order to find an updated position and orientation of the mobile depth camera which facilitates the alignment. Examples of this type of method are described in U.S. patent application Ser. No. 13/749,497 entitled “Camera pose estimation for 3D reconstruction” Sharp et al. which was filed on 24 Jan. 2013. It is also possible to compute 204 the camera pose using other data. For example the mobile environment capture device 100 may have sensors to track its pose such as a global positioning system, a compass, an accelerometer or other similar sensors to enable pose to be tracked. Combinations of one or more of these or other ways of computing the camera pose may be used.

The camera pose from the real time tracker may be in the form of a six degree of freedom (6DOF) pose estimate which indicates the location and orientation of the depth camera. In one example, the 6DOF pose estimate can be in the form of an SE₃ matrix describing the rotation and translation of the depth camera relative to real-world coordinates. More formally, this transformation matrix can be expressed as:

$T_{k} = \begin{bmatrix} R_{k} & t_{k} \\ 0^{\top} & 1 \end{bmatrix} \in SE_{3}$

Where $T_{k}$ is the transformation matrix for depth image frame k, $R_{k}$ is the camera rotation for frame k, $t_{k}$ is the camera translation at frame k, and the Euclidean group $SE_{3} := \{R, t \mid R \in SO_{3}, t \in \mathbb{R}^{3}\}$. Coordinates in the camera space (i.e. from the camera perspective) can be mapped to real-world coordinates by multiplying by this transformation matrix, and vice-versa by applying the inverse transform.
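The mapping between camera space and real-world coordinates can be illustrated with a minimal sketch. The function names (make_pose, transform, invert_pose) and the example pose are hypothetical and chosen only for illustration; the inverse uses the standard rigid-transform identity (the inverse rotation is the transpose of R and the inverse translation is the negated, transposed rotation applied to t).

```python
import math

def make_pose(R, t):
    """Build the 4x4 matrix T = [[R, t], [0^T, 1]] from a 3x3 rotation and a 3-vector."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def transform(T, p):
    """Apply T to a 3D point p using homogeneous coordinates."""
    h = [p[0], p[1], p[2], 1.0]
    return [sum(T[r][c] * h[c] for c in range(4)) for r in range(3)]

def invert_pose(T):
    """Invert a rigid transform: rotation becomes R^T, translation becomes -R^T t."""
    Rt = [[T[c][r] for c in range(3)] for r in range(3)]
    t = [T[r][3] for r in range(3)]
    t_inv = [-sum(Rt[r][c] * t[c] for c in range(3)) for r in range(3)]
    return make_pose(Rt, t_inv)

# Hypothetical pose: camera rotated 90 degrees about the y axis, 1 m along x.
a = math.pi / 2
R_k = [[math.cos(a), 0.0, math.sin(a)],
       [0.0, 1.0, 0.0],
       [-math.sin(a), 0.0, math.cos(a)]]
T_k = make_pose(R_k, [1.0, 0.0, 0.0])

p_camera = [0.0, 0.0, 2.0]                      # 2 m in front of the camera
p_world = transform(T_k, p_camera)              # camera space -> world space
p_back = transform(invert_pose(T_k), p_world)   # world space -> camera space
```

In this example the point two meters in front of the camera lands at approximately (3, 0, 0) in world coordinates, and applying the inverse transform recovers the original camera-space point up to floating point error.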

The 3D environment modeling system integrates 206 the current depth map 200 into a dense 3D model of surfaces in the environment. This process may begin with an empty 3D model which is gradually filled by aggregating information from captured depth map frames. This may be achieved as described in US patent publication 20120194516 entitled “Three-dimensional environment reconstruction” Newcombe et al. filed on 31 Jan. 2011 and published on 2 Aug. 2012.

The resulting 3D model may be stored in a volume of memory at a parallel processing unit, for example, as a 3D voxel grid 210, where each voxel stores a numerical value which is a truncated signed distance function value. This is described in US patent publication 20120194516 referenced above and will be referred to herein as storing the 3D model as a regular grid. Where the 3D voxel grid 210 stores a truncated signed distance function value at each voxel the capacity of the parallel processing unit memory of the 3D environment modeling system limits the volume of real world space that may be represented.

The 3D voxel grid 210 can be visualized as a cuboid of memory, wherein each memory location is a voxel representing a point in space of the environment being modeled. Therefore the 3D grid directly represents a spatial portion of the real-world environment. As the 3D volume corresponds directly to a real-world volume, the size of the real-world volume represented in a fixed-size memory determines the model resolution. For example, if a large real-world volume is to be modeled, then each voxel of the memory represents a larger region in real-world space, and hence the resolution is lower than if a smaller real-world volume is modeled. If more memory is available, however, the large real-world volume can be modeled at a higher resolution.
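The trade-off between real-world volume and resolution for a fixed-size regular grid can be made concrete with a short calculation. The grid size of 512 voxels per side, the volume sizes and the figure of 4 bytes per voxel are assumptions chosen for illustration only.

```python
def voxel_size(volume_side_m, voxels_per_side):
    """Edge length of one voxel when a cubic volume is sampled by a regular grid."""
    return volume_side_m / voxels_per_side

def grid_bytes(voxels_per_side, bytes_per_voxel):
    """Memory needed by a dense cubic grid; grows with the cube of the side resolution."""
    return voxels_per_side ** 3 * bytes_per_voxel

n = 512                          # assumed voxels per side
small = voxel_size(3.0, n)       # 3 m cube  -> about 5.9 mm per voxel
large = voxel_size(12.0, n)      # 12 m cube -> about 23.4 mm per voxel
memory = grid_bytes(n, 4)        # assumed 4 bytes per voxel -> 512 MiB
```

With the same memory, a volume four times larger along each side is sampled four times more coarsely. Keeping the finer voxel size over the larger volume would need 64 times as much memory in a regular grid.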

In various embodiments, a hierarchical data structure 212 is used to store at least part of the 3D model 208 to enable much larger volumes of real world space to be reconstructed at the same level of detail, using reduced memory capacity at a parallel processing unit, and enabling real time operation. New processes for creating, filling, storing and using examples of hierarchical data structures in real time are described below with reference to FIGS. 3 to 10. In these examples the hierarchical data structure achieves loss-less compression as compared with the regular grid 210 by using coarser nodes to represent free space in the world and finer nodes to represent the signed distance function near surfaces. This takes into account the fact that, typically, the vast majority of the environment is empty so that in a regular grid 210 most of the signed distance function is marked as free space.

Many different types of hierarchical data structure may be used such as pyramids or trees. For example, hierarchical data structures comprising trees which use spatial subdivision may be used as these enable a signed distance function representing the 3D modeled surface to be stored and updated as new depth maps arrive, without the need to completely rebuild the hierarchical data structure as each depth map is taken into account. A tree data structure comprises a root node, one or more levels of interior or split nodes and a plurality of leaf nodes. Branches connect the root node to first level interior nodes and connect interior level nodes to the next level of the tree until the terminal nodes, called leaf nodes, are reached. Data may be stored in the tree structure by associating it with one or more of the nodes.

Hierarchical data structures with spatial subdivision comprise one or more trees where branches of the trees divide real world space represented by the 3D model. Many different spatial subdivision strategies are possible. Regular spatial subdivision strategies may be used rather than anisotropic ones, because the camera pose is continually updated. With regular spatial subdivision, no assumptions need to be made about which way the user will move. For example, although an anisotropic grid may be well adapted for the camera when it is facing one direction, once the user turns (for example, 90 degrees left), the grid of the 3D model is no longer aligned and poor sampling results.

Hierarchical data structures formed with regular spatial subdivision may be built with any of a variety of different refinement strategies. A refinement strategy comprises rules and/or criteria for deciding when to create branches from a node. With no refinement a dense regular grid is generated as shown at 210 in FIG. 2 which scales as O(n³) in storage where n is the resolution of one side of the grid. With full dyadic refinement (i.e. a binary split along each axis giving 8 children for each node) and data stored at the leaves a complete octree is formed. This gives a very deep hierarchy that may be complex to update and traverse using a parallel processing unit such as a graphics processing unit. It is also possible to use different branching factors at each level of each tree which is known as an N³ tree structure. Another option is to use adaptive refinement whereby the signed distance function is represented at multiple resolutions by storing the value at different levels of the tree and splitting a node when it can no longer summarize the variation within.

Empirical investigation of different hierarchical data structures found that trees with regular spatial subdivision, such as N³ trees without adaptive refinement, give a good memory/performance trade-off. This type of hierarchical data structure is now described with reference to FIG. 3.

A 3D grid 300 similar to the 3D voxel grid 210 of FIG. 2 stores, instead of a truncated signed distance function value at each voxel as in FIG. 2, a record with an address of its child (if it has one) and, in some examples, information about subtrees of that voxel in the hierarchical data structure. The record occupies much less memory than a truncated signed distance function value. In this way the regular 3D grid 300 takes less memory than the 3D voxel grid 210 of FIG. 2.

A subset of the voxels of the 3D grid 300 are near the surface of the signed distance function as reconstructed so far. Each of the voxels in this subset becomes a root node of a tree. In FIG. 3 three such voxels are shown for clarity although in practice many more such voxels may be present. The way in which the subset of the voxels is selected is referred to as a refinement strategy for deciding which of the voxels in the grid will have a child node. In the example in FIG. 3 three voxels of the root level have a child node and each of these child nodes is shown as a cube with half as many voxels along each edge as for the root level grid as regular spatial subdivision is used. These level one nodes (also referred to as level one grids) 302, 304, 306 store, at each voxel, a record with an address of its child (if it has one) and, in some examples, information about sub-trees of that voxel in the hierarchical data structure. Each level one grid represents the same real world volume as one root level voxel, but at a finer resolution.

In the example of FIG. 3 the hierarchical data structure has three levels so that the second level nodes 308, 310, 312 are leaf nodes. However, it is also possible to use hierarchical data structures with two or more levels. A refinement strategy is used to select which of the level one voxels will have a child node. The refinement strategy may be the same as the refinement strategy used at the previous level. Regular spatial subdivision is used and so each leaf node stores a 3D grid with a resolution specified by the user. In the example shown in FIG. 3 the leaf nodes have half as many voxels along each edge as for the first level grids but this is an example; other resolutions may be used. Each leaf level grid represents the same real world volume as one first level voxel, but at a finer resolution. Each leaf level voxel may store a truncated signed distance function value and a weight representing the frequency of observations of that particular surface location obtained from depth maps so far.
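Because each child grid subdivides exactly the real world volume of its parent voxel, a world-space point maps to one voxel index per level. The following is a minimal sketch of that mapping, assuming a cubic volume with its origin at zero and per-level grid sides of 8, 4 and 2 voxels (each level having half as many voxels along an edge as the level above, as in FIG. 3). The function name locate and all numeric values are hypothetical.

```python
def locate(point, volume_side, sides):
    """Return the voxel index at each level for a point, coarsest level first.

    sides gives the number of voxels along one edge of the grid at each level.
    """
    path = []
    origin = [0.0, 0.0, 0.0]     # corner of the region covered by the current grid
    cell = volume_side           # edge length of the region covered by the current grid
    for n in sides:
        cell = cell / n          # edge length of one voxel at this level
        idx = tuple(min(int((point[a] - origin[a]) / cell), n - 1) for a in range(3))
        path.append(idx)
        # The child grid covers exactly the volume of the selected voxel.
        origin = [origin[a] + idx[a] * cell for a in range(3)]
    return path

sides = (8, 4, 2)                # root, level one and leaf grids
path = locate((3.3125, 0.625, 7.9375), 8.0, sides)
effective_resolution = sides[0] * sides[1] * sides[2]   # 64 voxels per side at leaf detail
```

Here an 8 meter volume is addressed at an effective resolution of 64 voxels per side (0.125 m leaf voxels), while leaf grids only need to exist for the root and level one voxels that are refined.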

More detail of an example of using the hierarchical data structure of FIG. 3 to represent a volumetric truncated signed distance function is now given with reference to FIG. 4. This illustrates, in two dimensions, the three level hierarchical data structure of FIG. 3 with a root level grid 400, one first level node 406 and one leaf node 408 shown for clarity (although in practice there will be many more intermediate nodes and leaf nodes).

At the root level the 3D grid (shown in 2D in FIG. 4) has sixty four voxels. The camera frustum (the volume of real world space, mapped to the 3D model space, which may potentially be sensed by the camera in its current pose) is illustrated (in 2D rather than 3D) as triangle 402 with one corner “cut off” by line 401. The camera frustum is known from the current camera pose and from calibrated characteristics of the camera. The current camera position (in model space) is at corner 403 of triangle 402 and line 401 represents the plane in front of the camera and beyond which surfaces may be sensed. Six voxels which are both within the camera frustum and have a currently observed depth value (from the current depth map) which is near the truncated signed distance function (represented as line 404) are shaded. These six voxels meet the refinement strategy criteria. In this example, the other voxels of the root level either have no child nodes or have child nodes generated from previous depth maps. The six voxels which meet the refinement strategy criteria have a level one child node created (unless one already exists). For example, level one child node 406 is shown comprising a 3D grid which is represented in 2D in FIG. 4 as a four voxel sided square. The level one child nodes are created by allocating and clearing a place in memory at a parallel processing unit as described in more detail below. The memory is used to store a 3D grid of voxels representing a subdivision of the real world space represented by the parent voxel at the root level.

Each level one child node descending from one of the six voxels which meet the refinement strategy criteria at level 0 is assessed according to the level 1 refinement strategy. For example, the level 1 node has three shaded voxels which meet the level 1 refinement strategy in FIG. 4, for example because these three voxels have an observed depth value in the current depth map which is near the truncated signed distance function 404.

The three shaded voxels which meet the level 1 refinement strategy each have a leaf node created (unless one already exists). For example, leaf node 408 is shown comprising a 3D grid which is represented in 2D in FIG. 4 as a block of four voxels. Each of these voxels which meets a leaf level refinement strategy has a truncated signed distance function value calculated together with a weight related to a frequency of observations of depth values for the real world surface location corresponding to the voxel. In various examples, a maximum of the calculated signed distance function values is selected and stored at each of the leaf level voxels which meets the refinement strategy. This helps to alleviate flickering near object edges in the signed distance function where cameras are used which tend to introduce noise at object boundaries in depth maps.

In various examples the refinement strategy takes into account a truncation region around the truncated signed distance function. This truncation region is illustrated schematically in FIG. 4 by two thin lines around line 404. The refinement strategy may comprise checking whether a current depth value (converted to model space) falls within a voxel that intersects a truncation region around the existing modeled signed distance function. In various examples the truncation region takes into account noise in the depth observations. For example, noise may be modeled as a Gaussian distribution with variance related to depth in such a way that the depth (denoted by symbol z) uncertainty of a depth sample grows in relation to the square of the depth from the camera. Therefore, in some examples, the truncation region is adaptive because it grows or shrinks in relation to the depth of the observation from the camera. By using an adaptive truncation region in this manner increased accuracy is found. However, it is not essential to use adaptive truncation as workable results are found with static truncation regions.
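An adaptive truncation region of the kind described above might be sketched as a half-width that grows with the square of the depth z. The functional form (a constant plus a quadratic term), the constants base and k, and the function names are assumptions for illustration; the description only states that the uncertainty grows in relation to the square of the depth.

```python
def truncation_width(z, base=0.01, k=0.0025):
    """Half-width of the truncation region in meters for an observation at depth z.

    base and k are hypothetical constants; a real system would fit them to the
    noise characteristics of the depth camera in use.
    """
    return base + k * z * z

def in_truncation_region(observed_depth, voxel_depth, width):
    """True if the voxel lies within the truncation region around the observation."""
    return abs(observed_depth - voxel_depth) <= width

near_width = truncation_width(1.0)   # narrow region close to the camera
far_width = truncation_width(4.0)    # wider region for a distant observation

# The same 3 cm offset is outside the region at 1 m but inside it at 4 m.
near_ok = in_truncation_region(1.0, 1.03, near_width)
far_ok = in_truncation_region(4.0, 4.03, far_width)
```

A static truncation region corresponds to setting k to zero so that the width no longer depends on depth.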

FIG. 5 is a flow diagram of a method of forming a hierarchical data structure such as that of FIG. 3 which uses parallel processing in order to facilitate real time operation. As mentioned above with reference to FIG. 4 the level one child nodes are created by allocating and clearing a place in memory at a parallel processing unit. As child nodes are created at other levels of the tree a similar memory allocation and clearing process occurs. In an example, memory is allocated in the parallel processing unit in advance and this memory is taken for use as nodes are created using a type of ticketing process. The ticketing process uses a free list and a backing store. A free list is a queue of block indices of blocks in the associated backing store. A backing store is an array of fixed sized memory blocks where each block has size equal to a grid at a given level of the hierarchy.

The advance memory allocation comprises allocating 500 a root level grid in parallel processing unit memory and storing there a 3D array of GridDesc records (one for each voxel of the root level grid), initialized to null. A GridDesc record stores a pointer to any child node of the root level voxel and various other optional flags and information as described in more detail below.

The advance memory allocation may also comprise, for each level of the hierarchy (the number of levels is specified in advance), allocating 502 a fixed size memory pool in parallel processing unit memory, with a free list and a backing store.

As depth maps are received these are integrated 504 into the hierarchical data structure in a parallel processing process which involves creating nodes of the hierarchical data structure where needed. This results in an updated hierarchical 3D model 508. A summarization process 506 may optionally be performed on the hierarchical data structure after each depth map integration, or at other intervals. The summarization process may also comprise a pruning process which removes sub-trees of the hierarchical data structure where appropriate, for example if sub-trees are formed representing data which later becomes known as noise or empty space.

FIG. 6 gives more detail about the hierarchical data structure with respect to the GridDesc records used at the root and intermediate levels and with respect to the free lists and backing stores. FIG. 6 shows, in two dimensions, the three level hierarchical data structure of FIG. 3 with a root level grid 400, one first level node 406 and one leaf node 408 shown for clarity (although in practice there will be many more intermediate nodes and leaf nodes).

One GridDesc record is shown for a single root level voxel which is shown in FIG. 6 as being near to the truncated signed distance function. The GridDesc record is repeated below:

Struct GridDesc

-   Bool nearSurface
-   Bool isDirty
-   Fixed16_t minWeight
-   Int poolIndex = 0

This pseudo code describes how a structure, called GridDesc, comprises a Boolean parameter field called “nearSurface” which is true if the voxel, or any voxels in a subtree from the voxel, are near the surface, as currently modeled. The test for being near the surface may use an adaptive truncation region as described above.

The structure comprises a Boolean parameter field called “isDirty” which is true if the memory from the backing store which is to be used for holding the GridDesc record needs clearing.

The structure comprises a fixed point numerical value field called “fixed16_t minWeight” for storing a numerical value. At leaf nodes the numerical value is a weight related to a frequency of observations of depth values occurring in the part of the real world represented by the voxel. At interior nodes and the root node, the numerical value stores the minimum of the weights of its children.

The structure comprises an integer field called “poolIndex” whose value is obtained by an atomic operation for taking an item from the free list. The integer field poolIndex stores a pointer to the node at the next level down. It may be thought of as a ticket as described earlier in this document.
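The rule that an interior or root record holds the minimum of the weights of its children can be sketched as a bottom-up pass over a small tree. This is a simplified illustration only: plain dictionaries stand in for grids, each node carries a single summary value rather than one GridDesc record per voxel, and the names summarize_min_weight, weights, children and min_weight are hypothetical.

```python
def summarize_min_weight(node):
    """Store and return the minimum weight found in the subtree rooted at node."""
    if "weights" in node:
        # Leaf grid: one observation weight per voxel.
        node["min_weight"] = min(node["weights"])
    else:
        # Root or interior grid: minimum over the summaries of its children.
        node["min_weight"] = min(summarize_min_weight(child)
                                 for child in node["children"])
    return node["min_weight"]

leaf_a = {"weights": [5, 3, 8, 6]}
leaf_b = {"weights": [9, 7, 7, 4]}
interior = {"children": [leaf_a, leaf_b]}
root = {"children": [interior]}
root_min = summarize_min_weight(root)
```

After the pass, every node records the smallest weight beneath it, so a low value at a coarse level indicates that some part of the subtree has been observed only rarely.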

To create the first level node 406 a free block is dequeued from the free list 600 using an atomic operation and assigned to the poolIndex field of the GridDesc structure. The free list is a queue of block indices, initialized to full (the list [0, 1, . . . n), where the symbol ) indicates that n is not included in the list). In the example shown in FIG. 6 free block number 3 is at the head of the queue and is dequeued by taking memory block 3 from backing store 602. The backing store is an array of n fixed-sized blocks where each block has size equal to an entire grid at that level.
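The ticketing process with a free list and a backing store can be sketched as a small fixed-size pool. This is a sequential simulation under stated assumptions: on a parallel processing unit the dequeue would be an atomic operation shared by many threads, whereas here a Python deque is used by a single thread. The class name Pool and its methods are hypothetical, and clearing the block at allocation time is one possible reading of the isDirty flag.

```python
from collections import deque

class Pool:
    """Fixed-size memory pool for one level of the hierarchy."""

    def __init__(self, num_blocks, block_size):
        # Free list initialized to full: indices 0, 1, ..., num_blocks - 1.
        self.free_list = deque(range(num_blocks))
        # Backing store: one fixed-size block per grid at this level.
        self.backing_store = [[0] * block_size for _ in range(num_blocks)]

    def allocate(self):
        """Take the block index at the head of the free list, or None if exhausted."""
        if not self.free_list:
            return None
        index = self.free_list.popleft()   # stands in for the atomic dequeue
        block = self.backing_store[index]
        for i in range(len(block)):        # clear the block before use
            block[i] = 0
        return index

    def release(self, index):
        """Return a block to the tail of the free list, for example after pruning."""
        self.free_list.append(index)

pool = Pool(num_blocks=4, block_size=8)
first = pool.allocate()     # ticket 0
second = pool.allocate()    # ticket 1
pool.release(first)         # block 0 goes to the back of the queue
tickets = [pool.allocate() for _ in range(4)]
```

The returned index plays the role of the poolIndex ticket: it identifies which block of the next level's backing store holds the child grid. Because all memory is reserved in advance, allocation during integration reduces to a single queue operation.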

First level node 406 has its own GridDesc structure which has the same fields as described above. These are not shown in FIG. 6 for clarity except for the “int poolIndex” field which has the value 2 in this example, meaning that its child grid is at location 2 in the next level's backing store.

Second level node 408 has an associated structure, which is different from the GridDesc structure. In the example of FIG. 6 the leaf level structure is called struct TSDF and comprises a field storing a fixed point value which is a truncated signed distance function value associated with the voxel (referred to as fixed16_t distance in FIG. 6); and also comprises a field storing a fixed point value which is a weight associated with the frequency of depth observations received for the voxel (referred to as fixed16_t weight in FIG. 6). The free list 604 for level two (leaf level in this example) is shown in FIG. 6 as having index 2 dequeued from the head of the queue and block 2 from backing store 606 used for the TSDF structure.

FIG. 7 is a flow diagram of a method of integrating a depth map into the hierarchical data structure of FIG. 3. An input depth map is received 700 and an updated camera pose 702 is received from the real time tracker of FIG. 1. Using the updated camera pose 702 and camera calibration information the camera frustum is calculated and applied to the current root level grid of the hierarchical data structure. Root level voxels in the root level grid are identified 704 which at least partly fall in the camera frustum and which are near the modeled surface; or which meet other criteria (such as already having subtrees with specified characteristics as described below).

The integration process may proceed in a top down manner. The process identifies which root voxels are to be updated and puts these into a queue. The process goes over the queue, doing the same for each level, until the leaves are reached. To identify root voxels to be updated, the process may look for root level voxels which touch the truncation region, or already have children and are in front of some surface in the current depth frame. An efficient way to do this is to project the root voxel to the screen, take its bounding box, and assign one thread to each pixel in the bounding box. The bounding box may be conservative such that not every pixel is inside the projection of the voxel. For each pixel two tests may be carried out: one to check whether the pixel is inside the projection of the voxel, and one to check whether the pixel is inside the truncation region. If one or both checks are true then the voxel is to be refined and it is placed in the queue.

Once the leaves are updated, the changes are summarized using a bottom up process. For example, where leaf nodes have been updated, a parent node of an updated child node can assess whether any of its child nodes are near the surface. If so, the parent node marks itself as such and tells its own parent.

In an example, one thread block is assigned 708 per identified root level voxel. Each thread block comprises a plurality of execution threads which may execute in parallel. For each identified root level voxel, its projection is rasterized using many threads to form the first level nodes.

The process moves to the first level nodes. One thread block may be assigned 710 per first level node (also referred to as a grid). For each first level grid, if the memory block from the backing store is dirty, the process uses threads of the thread block to co-operatively clear 712 the memory block.

For each first level grid, the process identifies those voxels for which there are one or more depth values (from the input depth map) which are near the modeled surface; voxels which meet other criteria may also be identified (such as those which already have children). To achieve this one thread from the thread block may be used per voxel. Thus for each first level grid, one thread from its thread block is used per voxel to rasterize 714 that voxel's projection. This forms the second level grids.

The process of steps 710, 712, 714 may be repeated for other interior levels of the hierarchy until a leaf level is reached. For each leaf level grid a thread block is assigned 718. The memory block assigned to the leaf level grid is cleared if needed as described above. One thread per voxel is used to compute and store at the voxel a truncated signed distance function value and optionally a weight. More detail about the process of computing and storing the truncated signed distance function value and weight is given below with reference to FIG. 10.

In various examples, including the example of FIG. 7 above, a depth map is integrated into the hierarchical data structure in breadth-first order. For the interior levels of the tree, including the root, the process conservatively rasterizes the footprint of the depth map into successively finer voxel grids with recursion mediated by atomic queues. At the root grid, voxel indices are determined by conservatively intersecting it with the bounding box of the camera frustum. Since root voxels project to large hexagons on screen, one thread block may be assigned per voxel at the root level and many threads used to rasterize its projection. At interior levels, since voxels now project to smaller hexagons on screen, one thread block per grid is assigned, with one thread per voxel.

In an example, a process for integrating a depth map into the hierarchical data structure of FIG. 3 is given using the following pseudocode:

  For each voxel v do in parallel
    If intersect(v, frustum) then
      Bbox2D ← boundingBox2D(project(v))
      For all pixels p ∈ Bbox2D do in parallel
        z ← depthMap[p]
        overlaps ← intersect(truncationRegion(z, σ(z)), v)
      anyOverlaps ← parallelReduce(overlaps)
      if threadID = 0 then
        desc ← grid[v]
        descend ← (anyOverlaps or hasChildren(desc))
        if descend then
          enqueue(jobQueue, v)
          if !hasChildren(desc) then
            desc.poolIndex ← alloc()
            desc.isDirty ← true

The above pseudo code describes using a thread for each voxel of a root level grid to carry out an integration process in parallel. The integration process involves checking if the voxel intersects the camera frustum and if so, calculating a two dimensional bounding box Bbox2D by using a function boundingBox2D with an argument project(v). For all the pixels in an input depth map which are a member of the 2D bounding box the process proceeds in parallel to look up the depth value z at the pixel and check if the depth value intersects with an adaptive truncation region around the signed distance function at the voxel.

A parallel reduce operation is applied to the per-pixel overlap results to combine them into a single value, anyOverlaps, which indicates whether any pixel of the depth map in the bounding box intersects the adaptive truncation region for the voxel.

One thread (the thread whose threadID is zero) then sets the variable desc to the grid record of the voxel, and the flag descend is set to true if the voxel has children or if there are any overlaps.

If the flag descend is set to true then a job is placed on the queue for voxel v. Atomic job queues may be allocated in memory. When the process calculates that a voxel is to be swept, its index is atomically enqueued onto the job queue. To work on the next level, the process may atomically dequeue voxel indices from the input job queue.

If the voxel has no children then memory is allocated for a child of the voxel and the isDirty flag is set if appropriate.
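The per-voxel decision described above may be sketched sequentially as follows. This is an illustrative Python sketch under simplifying assumptions, not the parallel implementation: the function name sweep_voxel is hypothetical, the frustum test and projection are assumed to have been done by the caller, the voxel is reduced to a depth interval, a poolIndex of None is used to mean "no children", and the built-in any stands in for the parallel reduction.

```python
import itertools
from collections import deque


def sweep_voxel(voxel_id, voxel_depth_range, depths, sigma, grid, job_queue, alloc):
    """Decide whether one voxel is refined; enqueue it and allocate a child if so.

    depths holds the depth values of the pixels inside the voxel's 2D bounding box.
    sigma(z) gives the half-width of the truncation region at depth z.
    """
    z_min, z_max = voxel_depth_range
    # One overlap test per pixel (one thread per pixel in the parallel version).
    overlaps = [z - sigma(z) <= z_max and z + sigma(z) >= z_min for z in depths]
    any_overlaps = any(overlaps)  # stands in for parallelReduce(overlaps)

    desc = grid[voxel_id]
    has_children = desc["poolIndex"] is not None
    if any_overlaps or has_children:
        job_queue.append(voxel_id)
        if not has_children:
            desc["poolIndex"] = alloc()
            desc["isDirty"] = True
    return any_overlaps


grid = {0: {"poolIndex": None, "isDirty": False},
        1: {"poolIndex": None, "isDirty": False}}
job_queue = deque()
next_block = itertools.count()


def sigma(z):
    return 0.01 * z  # hypothetical adaptive truncation half-width


def alloc():
    return next(next_block)  # hypothetical stand-in for the free list dequeue


sweep_voxel(0, (1.0, 1.5), [1.2, 3.0], sigma, grid, job_queue, alloc)
sweep_voxel(1, (1.0, 1.5), [3.0, 4.0], sigma, grid, job_queue, alloc)
```

In this example only voxel 0 has a depth value whose truncation region reaches it, so only voxel 0 is queued for the next level and given a child block.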

FIG. 8 is a flow diagram of a method of summarization and pruning (also referred to as garbage collection) of a hierarchical data structure such as that of FIG. 3. The summarization and pruning processes may use metadata stored at the GridDesc records of the nodes. For example, the nearSurface flag of a node may be used to indicate whether any voxel in a subtree is potentially near the modeled surface. The nearSurface flag may be used during raycasting to skip entire subtrees as described in more detail below with reference to FIG. 9. The minWeight value may be used to identify subtrees that may be pruned as they represent free space. This is now described in more detail with reference to FIG. 8.

Each leaf node is swept by parallel threads. For example, for each leaf node (also referred to as a leaf grid) in parallel, check 800 if any leaf voxels are near the modeled surface and if so, update the parent grid record by setting its nearSurface flag to true. In an example the check 800 comprises checking if any signed distance function values are near the surface geometry; that is, checking if any signed distance function values have a magnitude less than the diagonal of a leaf voxel. A parallel reduction of the results of these checks for the leaf level voxels may be made and the result used to set the nearSurface flag of the parent node.

For each leaf node in parallel, find 802 the minimum observation frequency weight and store that in the parent grid record. Parallel reduction may be used to find the minimum weight in a leaf grid.

Summarization proceeds 804 up the tree using the existing job queues until the root level is reached.
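The bottom up summarization may be sketched as a recursive pass. This is a simplified Python illustration: the Node class and the summarize function are hypothetical, the summary values are stored on the node itself rather than in a parent grid record, and recursion replaces the job queues and parallel reductions of the process described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Leaf nodes hold (distance, weight) pairs; interior nodes hold children.
    voxels: List[Tuple[float, float]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    near_surface: bool = False
    min_weight: float = 0.0


def summarize(node, leaf_diagonal):
    """Fill near_surface and min_weight for a subtree, leaves first."""
    if not node.children:
        # Leaf: near the surface if any distance magnitude is below the diagonal.
        node.near_surface = any(abs(d) < leaf_diagonal for d, _ in node.voxels)
        node.min_weight = min((w for _, w in node.voxels), default=0.0)
    else:
        for child in node.children:
            summarize(child, leaf_diagonal)
        # Interior: logical OR of the flags and minimum of the weights.
        node.near_surface = any(c.near_surface for c in node.children)
        node.min_weight = min(c.min_weight for c in node.children)
    return node


far = Node(voxels=[(0.5, 3.0), (0.4, 5.0)])
near = Node(voxels=[(0.001, 2.0), (-0.002, 7.0)])
root = summarize(Node(children=[far, near]), leaf_diagonal=0.007)
```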

The interior level grids (nodes) may then be pruned 806 on the basis of the grid records. For example, the minWeight field of the GridDesc records is optionally used as a heuristic for garbage collection. If an interior voxel has a sufficiently high minWeight and is not nearSurface, then it is unlikely to be nearSurface in the future and may be “frozen” as free space. An interior voxel identified on this basis may have its subtree deleted in the next integration pass and integration for this region of real world space may be skipped in future.
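The pruning heuristic amounts to a simple predicate over the grid records. In the following Python sketch the function name select_frozen, the dictionary representation of a GridDesc record and the threshold value are assumptions made for illustration; the text above does not specify what counts as a sufficiently high minWeight.

```python
def select_frozen(records, weight_threshold):
    """Return indices of interior voxels that may be frozen as free space."""
    return [
        index
        for index, record in enumerate(records)
        if not record["nearSurface"] and record["minWeight"] >= weight_threshold
    ]


records = [
    {"nearSurface": False, "minWeight": 40.0},  # well observed free space
    {"nearSurface": True, "minWeight": 40.0},   # near the surface: keep
    {"nearSurface": False, "minWeight": 2.0},   # too few observations: keep
]
frozen = select_frozen(records, weight_threshold=30.0)
```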

FIG. 9 is a flow diagram of a method of rendering an image from the 3D model in hierarchical form. The rendering process comprises raycasting many rays from the desired output image elements (in real world coordinates) into the 3D model. The raycasting process may use space skipping on the basis of the metadata in the hierarchical data structure GridDesc records. This is now described with reference to FIG. 9, which shows a parallelizable process for raycasting from the 3D model in hierarchical form, which is suited for execution on a GPU or multi-core CPU in a similar manner to the model generation process above.

To render a view of the model, a pose of a virtual camera defining the viewpoint for the image to be rendered is firstly received 900. This pose can be in the form of a 6DOF location and orientation of the virtual camera. A separate execution thread is then assigned 902 to each pixel in the image to be rendered.

The operations shown in box 904 are then performed by each execution thread to determine the value (e.g. shade, color etc.) to be applied to the thread's associated pixel. The x- and y-coordinates for the pixel associated with the thread are used with the pose of the virtual camera to convert 906 the pixel into real-world coordinates, denoted X, Y, Z. The real-world coordinates X, Y, Z can then be transformed 908 into voxel coordinates in the 3D hierarchical model.

These coordinates define a point on a ray for the pixel having a path emanating from the virtual camera location through the 3D hierarchical model. It is then determined 910 which voxel in the 3D hierarchical model root level grid is the first touched by this ray, and this is set as the starting voxel for the raycasting. The raycasting operation traverses the tree 912 in a depth first search manner to retrieve a signed distance function value for this location. This is done by checking if the nearSurface flag is set to true. If so, the process moves down the tree in the same manner until a leaf node is reached. If at any point the nearSurface flag is set to false, the process moves back up the tree in a depth first search manner along the ray. This enables space skipping to occur by using the nearSurface flag metadata.

When a leaf node is reached a check is made for a zero-crossing. If no zero-crossing is found the process moves back up the tree to the parent node and continues with any other child nodes of that parent node in a depth first search manner.

If a zero crossing is found (i.e. a sign change between the averaged signed distance function values stored in one voxel on the ray at the leaf level to the next voxel along the ray at the leaf level), the process calculates 916 a surface normal at the zero crossing. Optionally, the zero crossing check process can be arranged to determine the presence of a sign-change only from positive through zero to negative. This enables a distinction to be made between surfaces viewed from the front and surfaces viewed from “inside” the object.

When a zero-crossing is detected, this indicates the presence of a surface in the model. Therefore, this indicates the leaf level voxel at which the surface intersects the ray. In one example, the surface intersection point along a ray can be computed using a simple linear interpolation given trilinearly sampled points either side of the detected zero crossing to find the point at which a zero occurs. At the point at which the zero-crossing occurs, a surface normal is calculated 916. This can be performed by taking truncated signed distance function differences with neighboring voxels. This estimates a gradient which is the surface normal. In one example, the surface normal can be computed using a backward difference numerical derivative, as follows:

${{\hat{n}(x)} = \frac{\nabla{f(x)}}{{\nabla{f(x)}}}},{{\nabla f} = \lbrack {\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}} \rbrack^{\top}}$

where $\hat{n}(x)$ is the normal at point x, and ƒ(x) is the signed distance function value for voxel x. This derivative can be scaled in each dimension to ensure correct isotropy given potentially arbitrary voxel resolutions and reconstruction dimensions.
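The backward difference normal may be sketched in Python as follows. This is an illustration on a small dense grid held as nested lists, not a traversal of the hierarchical structure; the function name surface_normal and the voxel_size parameter (used for the per-dimension scaling) are assumptions for the example.

```python
import math


def surface_normal(tsdf, x, y, z, voxel_size=(1.0, 1.0, 1.0)):
    """Unit normal at voxel (x, y, z) from backward differences of the TSDF.

    Requires x, y, z >= 1 so that the previous voxel exists on each axis.
    """
    gx = (tsdf[x][y][z] - tsdf[x - 1][y][z]) / voxel_size[0]
    gy = (tsdf[x][y][z] - tsdf[x][y - 1][z]) / voxel_size[1]
    gz = (tsdf[x][y][z] - tsdf[x][y][z - 1]) / voxel_size[2]
    length = math.sqrt(gx * gx + gy * gy + gz * gz)
    if length == 0.0:
        return (0.0, 0.0, 0.0)  # flat region: no meaningful normal
    return (gx / length, gy / length, gz / length)


# A 3x3x3 grid whose distance grows along z only, so the normal points along +z.
tsdf = [[[0.1 * (k - 1) for k in range(3)] for _ in range(3)] for _ in range(3)]
normal = surface_normal(tsdf, 1, 1, 1)
```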

The process may cache and reuse the tree traversal from the current position on the ray to enable performance at step 912 to be improved. To compute a surface normal using differences with neighbors, the process uses multiple accesses. The neighbors are likely to be in the same grid as the initial point, so the process is able to cache which grid it is in and reuse it when appropriate.

The coordinates of the voxel at which the zero-crossing occurs are converted 918 into real-world coordinates, giving the real-world coordinates of the location of the surface in the model. From the real-world coordinates of the surface, plus its surface normal, a shade and/or color can be calculated 920. The calculated shade and/or color can be based on any suitable shading model, and take into account the location of a virtual light source.

As mentioned, the operations in box 904 are performed by each execution thread in parallel, which gives a shade and/or color for each pixel in the final output image. The calculated data for each pixel can then be combined to give an output image 922, which is a rendering of the view of the model from the virtual camera.

In an example, the process of step 912 of FIG. 9 may be implemented as follows. The process maintains as state a previous distance along the ray t_(p) (which is the distance along the ray to the previous root level voxel), a previous signed distance function value d_(p) (from the tree traversal at the previous root level voxel), and a stack of voxel indices down the hierarchy. The value of t_(p) is set to zero (at the camera viewpoint) and the tree is traversed to retrieve the current depth d_(p). At each iteration the process steps to the next voxel at the current level. If at an interior node and the nearSurface flag is set, the process finds the closest voxel at the next level down and pushes that onto the stack. Otherwise the process does nothing. If the process is at a leaf node, then a test is made whether there is a zero crossing. A zero crossing occurs when d_(p)>0 and d_(c)<0, where d_(c) is the signed distance function value at the current voxel. If a zero crossing is found the surface is at

$t_{z} = t_{p} + \frac{d_{p}}{d_{p} - d_{c}}$

Otherwise the process sets d_(p)=d_(c) and continues. If the process steps outside the bounds of the current grid the stack is popped so as to move back up the tree.
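The zero crossing test and interpolation may be sketched in Python for a sequence of signed distance values sampled along one ray. The sketch omits the tree traversal and the stack; the function name find_zero_crossing and the t_start and step parameters are assumptions. With step equal to 1 the interpolation matches the expression for t_(z) given above, and other values of step scale the interpolated offset for a different sample spacing.

```python
def find_zero_crossing(distances, t_start=0.0, step=1.0):
    """Return the ray distance of the first positive-to-negative crossing.

    distances must be non-empty. Returns None if no crossing is found.
    """
    t_p = t_start
    d_p = distances[0]
    for d_c in distances[1:]:
        if d_p > 0 and d_c < 0:
            # Linear interpolation between the previous and current samples.
            return t_p + step * d_p / (d_p - d_c)
        t_p += step
        d_p = d_c
    return None


t_z = find_zero_crossing([0.8, 0.3, -0.1])
```

A sign change from negative to positive is deliberately ignored, which corresponds to the optional behavior of detecting only surfaces viewed from the front.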

FIG. 10 is a flow diagram of a method of computing and storing a truncated signed distance function value and gives more detail of the process of step 718 of FIG. 7. A signed distance function calculation gives the value of the distance between the current voxel and the corresponding point in the depth image and is signed such that voxels outside (i.e. external to) the corresponding point in the depth image (from the camera's perspective) are given a positive distance, and voxels inside (i.e. internal to) the corresponding point in the depth image (from the camera's perspective) are given a negative distance. A value of zero indicates that the associated voxel is exactly coincident with the corresponding point. The signed distance function can be calculated readily from the depth value in the depth image at a location corresponding to the center of the voxel, minus the depth axis location of the center of the voxel. It is possible to use the center of the voxel where it is possible to assume that leaf voxels are smaller than a pixel of the input depth maps.

The signed distance function value may be normalized 1022 to a predefined distance value. In one example, this predefined value can be a small distance such as 5 cm, although any suitable value can be used. For example, the normalization can be adapted depending on the noise level and the thickness of the object being reconstructed. This can be defined manually by the user, or derived automatically through analysis of the noise in the data. It is then determined 1024 whether the normalized distance is greater than a positive threshold value (if the signed distance is positive) or less than a negative threshold value (if the signed distance is negative). If so, then the signed distance function values are truncated 1026 to maximum or minimum values. For example, if the normalized distance is greater than the positive threshold value, then the value can be truncated at +1 (the positive threshold value after normalizing), and if the normalized distance is less than the negative threshold value, then the value can be truncated at −1 (the negative threshold value after normalizing). The result of this calculation is known as a truncated signed distance function (TSDF).
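The signed distance calculation, normalization and truncation may be sketched in Python as follows. The function name truncated_sdf and its parameter names are assumptions; the default of 0.05 m corresponds to the 5 cm example value mentioned above.

```python
def truncated_sdf(depth_at_pixel, voxel_depth, truncation_distance=0.05):
    """Normalized, truncated signed distance for one voxel, in the range [-1, 1].

    depth_at_pixel: depth map value at the pixel the voxel center projects to.
    voxel_depth: depth axis location of the voxel center (same units).
    """
    signed_distance = depth_at_pixel - voxel_depth   # positive in front of surface
    normalized = signed_distance / truncation_distance
    return max(-1.0, min(1.0, normalized))           # truncate to [-1, +1]


in_front = truncated_sdf(2.0, 1.0)    # far in front of the surface
behind = truncated_sdf(1.0, 2.0)      # far behind the surface
close = truncated_sdf(1.02, 1.0)      # 2 cm in front of the surface
```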

The normalized (and if appropriate, truncated) signed distance function value is then combined with any previous value stored at the current voxel. In the case that this is the first depth image incorporated into the 3D model, then no previous values are present. However, as further frames from the depth camera are received and incorporated, then values can already be present at a voxel.

In one example, the signed distance function value is combined with a previous value by averaging 1028. This can assist with building models of environments with moving objects, as it enables an object that has moved to disappear over time as the measurement that added it becomes older and averaged with more recent measurements. For example, an exponentially decaying moving average can be used. In another example, the average can be a weighted average that uses a weighting function relating to the distance of the associated voxel from the depth camera. The averaged signed distance function values can then be stored 1030 at the current voxel.

In another example, two values can be stored at each leaf voxel. A weighted sum of the signed distance function values can be calculated and stored, and also a sum of the weights calculated and stored. The weights may be frequencies of depth observations. The weighted average can then be computed as (weighted sum)/(sum of weights).
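The two-value scheme may be sketched in Python as follows. The function names integrate and averaged are hypothetical, and a weight of 1 per observation is assumed so that the sum of weights counts the observations.

```python
def integrate(weighted_sum, weight_sum, tsdf_value, weight=1.0):
    """Fold one new observation into the two values stored at a leaf voxel."""
    return weighted_sum + weight * tsdf_value, weight_sum + weight


def averaged(weighted_sum, weight_sum):
    """Weighted average, or None if the voxel has not been observed."""
    if weight_sum <= 0:
        return None
    return weighted_sum / weight_sum


state = (0.0, 0.0)                      # (weighted sum, sum of weights)
state = integrate(*state, tsdf_value=0.5)
state = integrate(*state, tsdf_value=-0.25)
mean = averaged(*state)
```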

Using a hierarchical structure as described above enables interactive reconstruction of relatively large volumes, for example, at 1024³ resolution, (4 m)³ with (4 mm)³ voxels or (8 m)³ with (8 mm)³ voxels. To further scale to unbounded physical dimensions the 3D environment modeling system may decouple the physical volume from the working set. This is also applicable where a 3D grid is used rather than a hierarchical structure.

A working set comprises the parts of memory that an algorithm is currently using. In the examples where graphics processing units are used the working set may be parts of GPU memory currently being used by the 3D environment modeling system or rendering system. In examples, a working set may be defined as a set of fixed 3D array indices in GPU memory which is equal to a root grid resolution of the hierarchical structure. In embodiments where the 3D model is stored using a regular grid (without a hierarchical structure) the working set may be defined as a set of fixed 3D array indices in GPU memory which is equal to the 3D grid resolution.

A resolution (the number of voxels) at each level of the hierarchical structure may be specified together with a leaf level voxel size in meters. These parameters multiply to determine the physical size of a root voxel in meters. A world coordinate system may be quantized into units of root voxels which serve as keys indexing subtrees of the hierarchy.
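The multiplication of parameters may be worked through in Python. The function name root_voxel_size is hypothetical, and the per-level resolutions of 32, 8 and 4 voxels per axis are assumptions chosen only because their product is 1024, matching the 1024³ example with 4 mm leaf voxels.

```python
import math


def root_voxel_size(leaf_voxel_size_m, child_resolutions):
    """Physical edge length of one root voxel.

    child_resolutions lists the per-axis resolution of every level below the root.
    """
    return leaf_voxel_size_m * math.prod(child_resolutions)


root_resolution = 32                            # assumed root grid resolution
size = root_voxel_size(0.004, [8, 4])           # assumed interior and leaf grids
extent = size * root_resolution                 # edge length of the whole volume
```

Under these assumed resolutions a root voxel is about 0.128 m across and the full volume is about 4.096 m across, consistent with the (4 m)³ figure.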

An active region may be defined as a cubical (or other shaped) subset of the world coordinate system (in meters) that is centered on the camera's view frustum, but whose origin is quantized to a root voxel in the world. To ensure zero contention, the active region's effective resolution may be one root voxel less than that of the working set along each axis. This enables mapping voxels of the active region to indices of the working set using modular arithmetic.
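The quantization of world coordinates to root voxel keys, and the modular mapping of keys to working set indices, may be sketched in Python as follows. The function names root_key and working_set_index and the numeric values are assumptions for illustration.

```python
import math


def root_key(position_m, root_voxel_size_m):
    """Quantize a world position (meters) to an integer root voxel key."""
    return tuple(math.floor(p / root_voxel_size_m) for p in position_m)


def working_set_index(key, working_set_resolution):
    """Map a root voxel key to a fixed working set index by modular arithmetic."""
    return tuple(k % working_set_resolution for k in key)


key = root_key((0.3, -0.1, 1.0), root_voxel_size_m=0.128)
index = working_set_index(key, working_set_resolution=4)

# Along one axis, an active region of 3 consecutive keys maps to 3 distinct
# slots of a working set of resolution 4, so no two keys contend for a slot.
slots = {k % 4 for k in range(5, 8)}
```

Python's modulo operator returns a non-negative result for a positive modulus, so negative keys also map into the working set; in a language where the remainder can be negative an explicit adjustment would be needed.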

FIG. 11 is a schematic diagram of an active region and a working set in two dimensions. Active regions 1100, 1102, 1104, 1106 are shown as grids. For active region 1104 the working set is depicted by the cells of the grid which contain dots. For active region 1106 the working set is also depicted by the cells of the grid which contain dots. Active regions 1100 and 1102 are shown in relation to a world coordinate system. These active regions are associated with different camera positions; each camera frustum is depicted using a triangle as in FIG. 4. Each cell of the active region grids in FIG. 11 corresponds to a root voxel in the world.

The active region and the working set may be used to identify indices of the 3D model which may be streamed between the parallel processing unit memory and memory at the host computing device. Indices may be streamed out from GPU memory to the host or vice versa. For example, in FIG. 11 active region 1100 corresponds to active region 1104 and represents the situation for an old camera position. Active region 1102 corresponds to active region 1106 and represents a situation for a new camera position. Considering old active region 1104 the blank cells represent indices (or subtrees) which may be streamed out to the host as these are outside the working set. Considering new active region 1106 the subtrees to be streamed out are represented by cells filled with dots in grid 1108 and the subtrees to be streamed out are represented by cells filled with dots in grid 1110.

FIG. 12 is a flow diagram of a method of streaming 3D model data between memory at a parallel processing unit and memory at a host computing device. A camera pose is received 1200, for example, from real time tracker 114 of FIG. 1. An active region is calculated 1202 or updated using the received camera pose on the basis of the definition of an active region given above and knowledge of the resolution of the 3D model (as a regular grid or as a hierarchical structure). The active region is mapped 1204 to the working set. Using the knowledge of the active region and the working set mapping, working set indices to be streamed out are selected 1206 and working set indices to have data streamed in are selected 1208. For streaming out these may be working set indices which have become absent from the new active region since the previous active region. For streaming in these may be working set indices which are now present in the new active region and were absent in the previous active region.
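The selection of indices to stream out and stream in may be sketched as set differences between the old and new active regions. The following Python sketch works in two dimensions, as FIG. 11 does, and operates on root voxel keys rather than on working set indices; the function names region_keys and select_streaming are hypothetical.

```python
def region_keys(origin, resolution):
    """Root voxel keys covered by a square active region with the given origin."""
    ox, oy = origin
    return {(ox + i, oy + j) for i in range(resolution) for j in range(resolution)}


def select_streaming(old_origin, new_origin, resolution):
    """Return (keys to stream out, keys to stream in) for a move of the region."""
    old = region_keys(old_origin, resolution)
    new = region_keys(new_origin, resolution)
    stream_out = old - new   # present before, absent from the new active region
    stream_in = new - old    # present now, absent from the previous region
    return stream_out, stream_in


# The active region moves one root voxel along the first axis.
stream_out, stream_in = select_streaming((0, 0), (1, 0), resolution=3)
```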

Compression criteria may also be used during the selection 1206 of working set indices for streaming out. If a hierarchy is being used (see decision point 1210) then subtrees of the selected working set indices may be converted 1216 to depth first storage and streamed to the host. If a hierarchy is not being used the selected voxel values are streamed out 1212.

During streaming in, if a hierarchy is being used (see decision point 1210) subtrees are accessed from the host and restored 1218 to the hierarchical data structure. If a hierarchy is not being used the process streams 1214 in voxel values from the host.

In an example described with reference to FIG. 13, streaming from GPU to host uses two breadth-first traversals of the hierarchy. Given a set of working set indices to stream out on the host, the process copies 1300 these into a GPU queue and performs a tree traversal 1302 to determine how much space is needed for each subtree (using parallel reduction to compute the sum). The process performs a parallel prefix scan to compute 1304 offsets into a linear buffer where each subtree may be stored. A tree traversal 1306 is made to write each voxel into the linear buffer, replacing poolIndex with a byte offset from the beginning of each subtree. This operation converts a forest (in the form of a list of trees) from breadth first storage to depth first storage. The linear buffer and list of offsets may be copied 1308 to the host and each subtree stored 1310 in a dictionary. Streaming from host to GPU may be analogous.
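The offset computation corresponds to an exclusive prefix sum over the subtree sizes. The following Python sketch computes it sequentially; the function name subtree_offsets and the example sizes are assumptions, and a parallel processing unit would use a parallel prefix scan instead of the loop.

```python
def subtree_offsets(subtree_sizes):
    """Exclusive prefix sum: where each subtree starts in the linear buffer.

    Returns (offsets, total_size).
    """
    offsets = []
    total = 0
    for size in subtree_sizes:
        offsets.append(total)   # this subtree starts where the previous ones end
        total += size
    return offsets, total


# Sizes in bytes of three subtrees selected for streaming out (assumed values).
offsets, total = subtree_offsets([12, 4, 20])
```

The total gives the size of the linear buffer to allocate before the second traversal writes each subtree at its offset.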

In some examples a layered volumes scheme is used to enable larger scanning and viewing distances by using multiple graphics processors or other parallel processing units. The layered volumes scheme may be used where the 3D model is stored as either a regular grid, or as a hierarchical structure.

For example, FIG. 14 shows three GPUs (GPU0, GPU1 and GPU2) each used to store a 3D model or part of a 3D model reconstructed by the 3D environment modeling system described herein (or any other suitable 3D environment modeling system). GPU0 is used to represent world space 3 1400, GPU1 is used to represent world space 2 1402 and GPU2 is used to represent world space 1 1404. Triangle 1406 represents a camera frustum. The world spaces each have different physical dimensions. For example, world space 3 may be larger than world space 2 and world space 2 may be larger than world space 1. The world spaces may be centered on the same physical location so that world space 3 contains world space 2 which contains world space 1. The memory used at each of the GPUs may be the same. In this way GPU0 captures a coarse scale surface geometry, GPU1 captures an intermediate scale surface geometry and GPU2 captures a fine scale surface geometry.

FIG. 15 is a flow diagram of a method of integrating a depth map into layered volumes, such as the layered volumes of FIG. 14. A current depth map is received 1500 and the current camera pose is computed 1502. The current depth map is integrated into each of volumes 0, 1 and 2 at steps 1504, 1506 and 1508 respectively. A depth map integration process as described above may be used according to whether the 3D model is a regular grid or a hierarchical structure.

To render an image from the 3D model a raycasting process (such as described herein) may be applied 1510, 1512, 1514 to each volume separately and in parallel. The raycasting results are then blended 1516 or aggregated. The raycasting results may be fed back for use in the camera pose computation in some examples.

Where layered volumes are used it is possible to apply streaming. For example, a camera pose is received 1600 and the active region is updated 1602 as described above. The active region is mapped to a working set for each volume 1604 and this enables identification 1606 of data to be streamed in or out from the volume. Streaming takes place 1608 bidirectionally for each volume independently and in parallel.

FIG. 17 is a flow diagram of another method of integrating a depth map into layered volumes. A depth map is received 1700 and an associated camera pose is computed 1702. The depth map is integrated into the innermost volume 1704 and streaming 1706 is applied to the innermost volume. During streaming out the process populates 1708 coarser volumes with aggregated data from finer volumes.

In an example, an apparatus for constructing a 3D model of a real-world environment comprises:

an input interface arranged to receive a stream of depth maps of the real-world environment captured by a mobile environment capture device;

at least one parallel processing unit arranged to calculate, from the depth maps, a 3D model comprising values representing surfaces in the real-world environment;

a memory at the parallel processing unit arranged to store the 3D model in a hierarchical structure comprising a root level node, a plurality of interior level nodes and a plurality of leaf nodes, each of the nodes having an associated voxel grid representing a portion of the real world environment, the voxel grids being of finer resolution at the leaf nodes than at the root node;

the parallel processing unit arranged to compute and store, at the root and interior nodes, metadata describing the hierarchical structure, and to compute and store at the leaf nodes, the values representing surfaces.

For example, the parallel processing unit is arranged to form interior nodes and leaf nodes by allocating memory blocks using atomic queues.

For example, the parallel processing unit is arranged to form interior nodes and leaf nodes on the basis of a refinement strategy which takes into account distances of depth observations from surfaces modeled by the 3D model.

For example, the apparatus has the parallel processing unit being at least partially implemented using hardware logic selected from any one or more of: a field-programmable gate array, an application-specific integrated circuit, an application-specific standard product, a system-on-a-chip, a complex programmable logic device, a graphics processing unit.

Alternatively, or in addition, the functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), Graphics Processing Units (GPUs).

FIG. 18 illustrates various components of an exemplary computing-based device 1800 which may be implemented as any form of a computing and/or electronic device, and in which embodiments of the above described 3D modeling techniques may be implemented.

Computing-based device 1800 comprises one or more processors 1802 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to perform 3D reconstruction. In some examples, for example where a system on a chip architecture is used, the processors 1802 may include one or more fixed function blocks (also referred to as accelerators) which implement a part of the 3D modeling, rendering, or streaming methods in hardware (rather than software or firmware).

The computing-based device 1800 also comprises a graphics processing system 1804 which communicates with the processors 1802 via a communication interface 1806, and comprises one or more graphics processing units 1808, which are arranged to execute parallel, threaded operations in a fast and efficient manner. The graphics processing system 1804 also comprises a memory device 1810, which is arranged to enable fast parallel access from the graphics processing units 1808. In examples, the memory device 1810 can store the 3D model, and the graphics processing units 1808 can perform the model generation and raycasting operations described above.

The computing-based device 1800 also comprises an input/output interface 1812 arranged to receive input from one or more devices, such as the mobile environment capture device (comprising the depth camera), and optionally one or more user input devices (e.g., a game controller, mouse, and/or keyboard). The input/output interface 1812 may also operate as a communication interface, which can be arranged to communicate with one or more communications networks (e.g. the Internet).

A display interface 1814 is also provided and arranged to provide output to a display system integral with or in communication with the computing-based device. The display system may provide a graphical user interface or other user interface of any suitable type although this is not essential.

The computer executable instructions may be provided using any computer-readable media that is accessible by computing-based device 1800. Computer-readable media may include, for example, computer storage media such as memory 1816 and communications media. Computer storage media, such as memory 1816, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. Therefore, a computer storage medium should not be interpreted to be a propagating signal per se. Propagated signals may be present in computer storage media, but propagated signals per se are not examples of computer storage media. Although the computer storage media (memory 1816) is shown within the computing-based device 1800 it will be appreciated that the storage may be distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 1812).

Platform software comprising an operating system 1818 or any other suitable platform software may be provided at the computing-based device to enable application software 1820 to be executed on the device. The memory 1816 can store executable instructions to implement the functionality of a dense model integration engine 1822 (e.g. arranged to build up the 3D model using the process described with reference to FIG. 7), a dense model visualization engine 1824 (e.g. arranged to output a rendered image of the model using the raycasting process of FIG. 9), and a dense model query engine 1826 (arranged to get data from the model, e.g. for constructing a polygon mesh). The memory can also provide a data store 1830, which can be used to provide storage for data used by the processors 1802 when performing the 3D modeling techniques, such as for storing a polygon mesh. The data store 1830 may also store data streamed out from the 3D model. The data store 1830 may store parameter values, user settings, depth maps, rendered images and other data. The memory 1816 may store executable instructions to implement the functionality of a camera tracking engine 1828 for tracking pose of a mobile environment capture device. The memory 1816 may store executable instructions to implement the functionality of a streaming engine 1832 in examples where data is streamed into or out of the 3D model, for example, as described with reference to FIGS. 11-13.

Any of the input/output interface 1812 and the display interface 1814 may comprise NUI technology which enables a user to interact with the computing-based device in a natural manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls and the like. Examples of NUI technology that may be provided include but are not limited to those relying on voice and/or speech recognition, touch and/or stylus recognition (touch sensitive displays), gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Other examples of NUI technology that may be used include intention and goal understanding systems, motion gesture detection systems using depth cameras (such as stereoscopic camera systems, infrared camera systems, RGB camera systems and combinations of these), motion gesture detection using accelerometers/gyroscopes, facial recognition, 3D displays, head, eye and gaze tracking, immersive augmented reality and virtual reality systems and technologies for sensing brain activity using electric field sensing electrodes (EEG and related methods).

The term ‘computer’ or ‘computing-based device’ is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the terms ‘computer’ and ‘computing-based device’ each include PCs, servers, mobile telephones (including smart phones), tablet computers, set-top boxes, media players, games consoles, personal digital assistants and many other devices.

The methods described herein may be performed by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. Examples of tangible storage media include computer storage devices comprising computer-readable media such as disks, thumb drives, memory etc. and do not include propagated signals. Propagated signals may be present in tangible storage media, but propagated signals per se are not examples of tangible storage media. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.

This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.

Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that, by utilizing conventional techniques known to those skilled in the art, all or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.

Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.

The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.

The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.

It will be understood that the above description is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this specification.

1. A computer-implemented method comprising: receiving, at a processor, a stream of depth maps of the real-world environment captured by a mobile environment capture device; calculating, from the depth maps, a 3D model comprising values representing surfaces in the real-world environment; storing the 3D model in a hierarchical structure comprising a root level node, a plurality of interior level nodes and a plurality of leaf nodes, each of the nodes having an associated voxel grid representing a portion of the real world environment, the voxel grids being of finer resolution at the leaf nodes than at the root node; storing, at the root and interior nodes, metadata describing the hierarchical structure; storing at the leaf nodes, the values representing surfaces.
2. A method as claimed in claim 1 wherein storing the 3D model in a hierarchical structure comprises forming the interior level nodes and the leaf nodes on the basis of a refinement strategy which checks whether a depth observation from a depth map is near to at least some of the values representing surfaces in the real-world environment.
3. A method as claimed in claim 2 wherein the refinement strategy checks whether a depth observation from a depth map is near to at least some of the values by using a truncation region which adapts according to the depth observation from the mobile environment capture device.
4. A method as claimed in claim 1 wherein storing the 3D model in a hierarchical structure comprises forming, in parallel, interior nodes for selected voxels of the voxel grid of the root node, by using a thread block for each of the selected voxels.
5. A method as claimed in claim 1 wherein storing the 3D model in a hierarchical structure comprises forming, in parallel, a child node for each of selected voxels of voxel grids of interior nodes, by using one thread per selected voxel of an interior node.
6. A method as claimed in claim 1 wherein storing the 3D model in a hierarchical structure comprises allocating, for each of a plurality of levels of the hierarchical structure, a fixed size memory pool.
7. A method as claimed in claim 6 wherein each fixed size memory pool comprises a backing store which is a plurality of memory blocks each sized according to a voxel grid size used at a level of the hierarchy, and a free list, which is a queue of indices of the backing store memory blocks.
8. A method as claimed in claim 7 wherein storing the 3D model in a hierarchical structure comprises forming interior and leaf nodes by using memory blocks from the backing store according to the free lists.
9. A method as claimed in claim 1 wherein the metadata comprises a near surface flag indicating whether at least one depth observation associated with a node is near to at least some of the values representing surfaces in the real-world environment.
10. A method as claimed in claim 1 wherein the metadata comprises a minimum weight value related to a minimum number of depth observations associated with a node.
11. A method as claimed in claim 1 comprising, computing and storing the metadata by traversing the hierarchical data structure from each of the leaf nodes in parallel to the root level node.
12. A method as claimed in claim 1 comprising, for each leaf node, checking, in parallel, each voxel of the leaf node voxel grid, by comparing the value stored at the leaf node voxel with a threshold, and setting a near surface flag of a parent node of the leaf node according to the results of the checks.
13. A method as claimed in claim 1 comprising pruning the hierarchical structure by removing nodes on the basis of the metadata.
14. A method as claimed in claim 1 comprising rendering an image from the hierarchical structure using a raycasting process with space skipping, the space skipping being facilitated using the metadata.
15. A computer-implemented method comprising: receiving, at a processor, a stream of depth maps of the real-world environment captured by a mobile environment capture device, and also receiving at the processor a position and orientation of the mobile environment capture device associated with each depth map; calculating, from the depth maps, a 3D model comprising values representing surfaces in the real-world environment; storing in memory of a parallel processing unit the 3D model; calculating an active region of the real-world environment using a current position and orientation of the mobile environment capture device; mapping the active region to a working set of the memory; streaming values of the 3D model between the memory of the parallel processing unit and memory of a host device on the basis of the mapping.
16. A method as claimed in claim 15 comprising storing the 3D model in a hierarchical structure at the memory of the parallel processing unit and using compression criteria to select values of the 3D model to be streamed out of the memory at the parallel processing unit.
17. An apparatus for constructing a 3D model of a real-world environment comprising: an input interface arranged to receive a stream of depth maps of the real-world environment captured by a mobile environment capture device; a plurality of parallel processing units arranged to calculate, from the depth maps, a 3D model comprising values representing surfaces in the real-world environment; each parallel processing unit having a memory storing at least part of the 3D model using the same amount of memory and where the memory is mapped to different physical dimensions in the real-world environment for each of the parallel processing units.
18. An apparatus as claimed in claim 17 each parallel processing unit arranged to calculate the 3D model independently from the depth maps.
19. An apparatus as claimed in claim 17 wherein each of the parallel processing units represents a different sized volume centered on a same position in the real world environment.
20. An apparatus as claimed in claim 17 comprising calculating the 3D model at the parallel processing unit representing a smallest volume and aggregating values from that parallel processing unit to fill the 3D model at the other parallel processing units.
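
By way of illustration only, and without limiting the claims, the fixed size memory pool recited in claims 6 to 8 (a backing store of equally sized memory blocks together with a free list that is a queue of block indices) may be sketched as follows. The sketch is written in Python for readability rather than for a parallel processing unit. The names `FixedSizePool`, `allocate` and `release`, the grid sizes, the block counts and the fill value are assumptions introduced for this example and do not appear in the description above.

```python
from collections import deque


class FixedSizePool:
    """Hypothetical fixed size pool for one level of the hierarchy."""

    def __init__(self, num_blocks, grid_size, fill=0.0):
        # Each block holds one voxel grid of grid_size^3 values
        # (the grid size is an assumed, per-level parameter).
        self.block_len = grid_size ** 3
        self.fill = fill
        # Backing store: every block is allocated once, up front.
        self.backing_store = [
            [fill] * self.block_len for _ in range(num_blocks)
        ]
        # Free list: a queue of indices of unused backing store blocks.
        self.free_list = deque(range(num_blocks))

    def allocate(self):
        """Take a block index from the free list, or None if exhausted."""
        if not self.free_list:
            return None
        return self.free_list.popleft()

    def release(self, index):
        """Reset a block and return its index to the free list.

        The caller is assumed to release each index at most once.
        """
        block = self.backing_store[index]
        for i in range(self.block_len):
            block[i] = self.fill
        self.free_list.append(index)


# One pool per level, each sized for that level's voxel grid
# (level names, sizes and counts are illustrative assumptions).
pools = {
    "interior": FixedSizePool(num_blocks=4, grid_size=4),
    "leaf": FixedSizePool(num_blocks=8, grid_size=8),
}

# Forming a leaf node takes the next free block from the leaf pool.
leaf_index = pools["leaf"].allocate()
pools["leaf"].backing_store[leaf_index][0] = 0.25  # example surface value
```

In this sketch, forming a node never allocates new memory; it only dequeues an index, which is one way a fixed memory budget per level could be respected. A parallel implementation would additionally need the free list operations to be safe under concurrent access, which this serial sketch does not address.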